ABSTRACT

This paper continues our development of non-parametric tests for analysing the completeness 
in apparent magnitude of magnitude-redshift surveys. The purpose of this third and final paper 
in our completeness series is two-fold; firstly we explore how certain forms of incompleteness 
for a given flux-limited galaxy redshift survey would manifest themselves in the ROBUST 
Tc and Tv completeness estimators introduced in our earlier papers; secondly we provide a
comprehensive error propagation for these estimators. 

This work was initiated by Rauzy (2001) and then extended and developed by 
Johnston et al. (2007) (Completeness I) and Teodoro et al. (2010) (Completeness II). Here
we seek to consolidate the ideas laid out in these previous papers. In particular our goal is 
to provide for the observational community statistical tools that will be more easily appli- 
cable to real survey data. By using both real surveys and Monte Carlo mock survey data, 
we have found distinct, characteristic behaviour of the Tc and Tv estimators which identify
incompleteness in the form of e.g. missing objects within a particular magnitude range. Con- 
versely we have identified signatures of 'over' completeness, in cases where a survey contains 
a small region in apparent magnitude that may have too many objects relative to the rest of the 
data set. Identifying regions of incompleteness (in apparent magnitude) in this way provides 
a powerful means to e.g. improve weighting schemes for estimating luminosity functions, or 
for more accurately determining the selection function required to employ measures of galaxy 
clustering as a cosmological probe. 

We also demonstrate how incompleteness resulting from luminosity evolution can be 
identified and provide a framework for using our estimators as a robust tool for constraining 
models of luminosity evolution. 

Finally we explore the error propagation for Tc and Tv. This builds on Completeness II
by allowing the definition of these estimators, and their errors, via an adaptive procedure that 
accounts for the effects of sampling error on the observed distribution of apparent magnitude 
and redshift in a survey. 

Key words: Cosmology: methods: data analysis - methods: statistical - astronomical data bases:
miscellaneous - galaxies: redshift surveys - galaxies: large-scale structure of Universe. 



1 INTRODUCTION 

The notion that we have entered an era of precision cosmology 
has been consistently cited in cosmology publications for more 
than a decade (see Turner 1998). While many would support
this statement when applied to CMBR measurements, e.g. those made
by the Wilkinson Microwave Anisotropy Probe (WMAP - see e.g.
Spergel et al. 2003; Larson et al. 2011), in other areas, such as the
study of galaxy redshift surveys, the era of precision cosmology
would appear to be approaching but will require improvement in 
both the quality and size of our datasets and (crucially) our statisti- 
cal toolbox before we can claim that it has truly arrived. 

Estimating the luminosity function (LF) remains a power- 
ful and popular probe of galaxy evolution (e.g. Norberg et al. 
Rodighiero et al. 2010). However, its accurate determination
remains one of the most fundamental statistical challenges in 
modern observational cosmology. 

The methodology employed to estimate the LF varies greatly 
throughout the literature. Some of these approaches have been 
non-parametric, i.e. adopting no specific parametric model for the 
LF, and have ranged historically from the classical number count 
test (e.g. Hubble 1936; Christensen 1975), the Schmidt (1968)
1/Vmax estimator, the φ/Φ method (e.g. Turner 1979), and the
Lynden-Bell (1971) C⁻ method. Alternatively, there have also
been several parametric methods developed, generally based on the
Maximum Likelihood Estimator (MLE) method of Sandage et al.
(1979) (STY), where a parametric form of the LF (most commonly
that of Schechter 1976) is assumed. Thirdly, there exists a 'hybrid' 
method: the non-parametric counterpart of the MLE developed by 
Efstathiou et al. (1988), and often referred to as the Stepwise Max- 
imum Likelihood method (SWML). Of course we have not listed 
the many variations of these three broad approaches that have arisen 
as a result of a now thriving industry that has produced (and con- 
tinues to produce) the myriad of complex and tailor-made galaxy 
redshift surveys. For more detailed articles that trace the origins 
and development of this area of statistics the reader is referred to 
e.g. Johnston (2011); Binggeli et al. (1988); Willmer (1997) and 
Takeuchi et al. (2000).

Recently the field of LF estimation has witnessed some sig- 
nificant fresh developments, where new approaches have emerged 
with the goal of placing the methodology on a more rigorous sta- 
tistical footing. Examples of these include: the semi-parametric ap- 
proach by Schafer (2007); Bayesian methods of Andreon (2006) 
and Kelly et al. (2008), the use of the copula¹ by Takeuchi (2010)
to construct the bivariate LF; and a non-parametric inversion tech- 
nique by Le Borgne et al. (2009) applied to galaxy counts. 

Two key fundamental assumptions that are common to almost 
all LF estimators and which are also crucial to the work detailed in 
this paper can be summarised as: 

(i) Separability between the luminosity function, φ(M), and the
density function, ρ(z), probability densities; i.e. one assumes that
the underlying joint distribution of luminosity (equivalently abso- 
lute magnitude) and redshift may be written as the product of their 
marginal distributions. 

(ii) Completeness in apparent magnitude up to a specified faint 
apparent magnitude limit or within specified bright and faint mag- 
nitude limits. 

For clarity, note that we define completeness in this context as the 
probability that a galaxy of apparent magnitude, m, is observable. 
In Johnston et al. (2007) (hereafter Completeness I) and 
Teodoro et al. (2010) (hereafter Completeness II) we discussed in 
some depth the relative merits of traditional apparent magnitude 
completeness tests - e.g. the classical number count test (Hubble 
1926) and the still widely used Schmidt (1968) V/Vmax test. In
particular we highlighted that both tests are susceptible to bias 
when applied to a survey which is spatially inhomogeneous. More 
specifically, we noted that in practice it can be difficult to de- 
cide whether deviations from the expected value of the respec- 
tive test statistics are indeed the result of survey incompleteness 
in apparent magnitude, or are an artefact of galaxy clustering 



¹ The copula is a function used to join multivariate distribution functions
to their one-dimensional marginal distribution functions and is particularly
useful for variables with co-dependence.



and/or evolution of the galaxy luminosity function. However, at 
least in the case of V/Vmax (and its 1/Vmax counterpart), this
issue has not diminished the widespread application and extension
of these tests - possibly due to their simple implementation
(see e.g. Huchra and Sargent 1973; Felten 1976; Avni and Bahcall
1980; Hudson and Lynden-Bell 1991; Eales 1993; Qin and Xie 
1997, 1999; Page and Carrera 2000; Sheth 2007). Nevertheless, we 
believe that this should not deter us from developing better statis- 
tical tests that are less prone to such biases when the assumptions 
underpinning them are not fully satisfied. 

Efron and Petrosian (1992) (hereafter EP92) revisited the
properties of the seminal Lynden-Bell (1971) C⁻ method for
constructing galaxy LFs. Their paper introduced a powerful new
approach to analysing magnitude-redshift surveys in which they
proposed a non-parametric permutation test of the independence of
the spatial and luminosity distributions of galaxies in a
magnitude-limited sample. As with the C⁻ method, the EP92 test required no
assumptions concerning the parametric form of either the spatial
distribution or the galaxy luminosity function. However, this
estimator assumes the data under test are complete in apparent magnitude.
They applied their permutation test to a quasar sample, with an as- 
sumed apparent magnitude limit, in order to robustly estimate the 
parameters characterising the luminosity distance-redshift relation 
of the quasars (see also Efron and Petrosian 1999). These permuta- 
tion test tools have since been adopted to explore e.g. evolutionary 
models for the dependence of Gamma Ray Burster (GRB) lumi- 
nosity with redshift (Lloyd-Ronning et al. 2002), correlations be- 
tween the luminosity of galactic nuclei and that of their host galaxy 
(Hao et al. 2005), and, more recently, the radio and optical lumi- 
nosity functions in Singal et al. (2011). 

Rauzy (2001) (hereafter R01) extended the ideas of EP92 by
adapting them to develop a simple but powerful tool for assessing
the apparent magnitude completeness of magnitude-redshift
surveys. As was the case with EP92 - and unlike the Hubble number
counts or V/Vmax tests - the Rauzy test statistic Tc requires no
assumption that the spatial distribution of galaxies is homogeneous.
Moreover, it also requires no assumption of a specific parametric
form for the galaxy luminosity function. However, the Rauzy test
was formulated only for the case of an assumed sharp, faint
apparent magnitude limit. The Rauzy test has since been applied to
a wide variety of data, exploring e.g. the completeness limits of the
HI mass function in the HIPASS survey (Zwaan et al. 2004), the HI
flux completeness of ALFALFA survey data (Toribio et al. 2011),
and nearby dwarf galaxies in the local volume (Lee et al.
2009). A recent study by Devereux et al. (2009) adopted the Rauzy
method to determine the completeness of multi-wavelength
selected data.

Completeness I extended the Rauzy test to the more realistic
case of data characterised by both a faint and bright apparent
magnitude limit. Moreover we introduced a new variant statistic,
denoted Tv, that samples the cumulative distance modulus, Z,
distribution but retains similar robust properties to those of Tc - i.e.
being independent of the spatial distribution of galaxies. By
sampling the data in this way, the Tv statistic can be considered as an
improved, differential version of the widely used V/Vmax test
(which assumes spatial homogeneity).

Our next paper, Completeness II, highlighted the fact that the
previously defined completeness estimators may be susceptible to
'shot-noise' effects if studying completeness in a small parent
data-set. This phenomenon can be explained in the context of
Completeness I, where the basic construction of the Tc and Tv statistics
allocated a volume limited subsample to each individual galaxy in
the catalogue. With the introduction of a secondary bright apparent
magnitude limit (as illustrated in Figure 1) the size of the volume
limited subsample constructed for each galaxy may be significantly
restricted. Completeness II showed how this may render
the Tc and Tv completeness estimators 'shot-noise' dominated
when the volume limited regions are very sparsely sampled. As a
result Completeness II proposed an extension of the basic Tc and
Tv completeness estimators facilitating their construction in a manner
that essentially maintains a constant 'signal-to-noise' (s/n)
level.

This article examines more deeply characteristic forms of
incompleteness and how they will be manifested in the resulting Tc
and Tv estimators. We also further extend the tools developed in
Completeness II, where we now introduce an error propagation
analysis for the Tc and Tv statistics. Hence we continue our task
of providing a statistically rigorous, but practical, foundation for 
testing the magnitude completeness of current and future redshift 
surveys. 

The structure of this paper is as follows. In § 2 we briefly outline
the basic framework that has underpinned the development of
the Tc and Tv statistics, from R01 through to the signal-to-noise
approach detailed in Completeness II. In § 3 we explore various
forms of incompleteness through the use of real and simulated data.
In § 4 we derive expressions for the error propagation of Tc and Tv.
Finally in § 5 we discuss our findings and summarise our
conclusions.



2 BUILDING THE COMPLETE PICTURE

In this section we briefly outline all the elements essential to
defining our completeness estimators, as previously developed in
Rauzy (2001) (R01), Johnston et al. (2007) (Completeness I) and
Teodoro et al. (2010) (Completeness II). The reader is referred to
those articles for a more detailed discussion.



2.1 The assumption of separability

In R01 the foundations of our statistical method were established
on the assumption of separability, whereby the luminosity function
of the galaxy distribution is considered to be independent of
the three-dimensional redshift space coordinates z = (z, l, b)
of the surveyed galaxies (where (l, b) are galactic coordinates). As
we have already discussed in our previous papers, this is a rather
restrictive assumption which, nevertheless, underpins most of the
traditional completeness tests (and indeed most luminosity function
estimators as well) in the literature. In future work we will
investigate methods to further adapt our completeness estimators,
exploiting departures from the null hypothesis of separability as a
sensitive probe of galaxy evolution. For the remainder of this paper,
however, we will generally restrict our attention to cases where the
assumption of separability is satisfied.



2.2 Defining the fundamental variables, Z and M

We begin by considering the uncorrected distance modulus Z. By
uncorrected we mean (in the first instance) that K- and source
evolution (e-) correction terms have been neglected. Although this
approach differs from that of R01, where such corrections were
integral to the formulation in that paper, our subsequent experience
of applying our completeness estimators has demonstrated that it is
more instructive to begin with the raw (i.e. only initially corrected
for extinction) magnitudes; one can then add e.g. K- and evolution
correction terms in an incremental fashion, thus more readily
determining the impact of each correction on the magnitude
completeness of the survey. Therefore, Z is simply defined as

$$ Z = 5\log_{10}(d_L) + 25 \equiv m - M, \tag{1} $$

where Z is the distance modulus corresponding to redshift z, m is the
apparent magnitude corrected for extinction only and d_L is the
luminosity distance. The absolute magnitude, M, is then equivalently
defined as

$$ M = m - 5\log_{10}(d_L) - 25. \tag{2} $$

Note also that in what follows, for simplicity, we will suppress
explicit reference to the angular coordinates and work with the
distribution of Z marginalised over l and b.



2.3 The random variables, ζ(m*) and τ(m*)

The Tc and Tv statistics are constructed from estimates of the
random variables, ζ(m*) and τ(m*) respectively (see Figure 1), which
are defined in terms of the joint probability density in M and Z,
constructed in a separable form. m* represents the trial faint
apparent magnitude limit under test, indicated by the red diagonal lines in
Figure 1. Thus, following R01 and Completeness I, for each object
i present in a catalogue we define

$$ T_c(m_*) = \frac{\sum_i \left[\zeta_i(m_*) - 1/2\right]}{\left[\sum_i \mathrm{Var}(\zeta_i)\right]^{1/2}} \tag{3} $$

and

$$ T_v(m_*) = \frac{\sum_i \left[\tau_i(m_*) - 1/2\right]}{\left[\sum_i \mathrm{Var}(\tau_i)\right]^{1/2}}, \tag{4} $$

where

$$ \zeta_i(m_*) = \frac{F(M_i) - F[M^{b}_{\rm lim}(Z_i - \delta Z)]}{F[M^{f}_{\rm lim}(Z_i)] - F[M^{b}_{\rm lim}(Z_i - \delta Z)]}, \qquad \hat{\zeta}_i(m_*) = \frac{n(S_1)}{n(S_1 \cup S_2)} \simeq \frac{r_i(m_*)}{n_i(m_*) + 1}, \tag{5} $$

and

$$ \tau_i(m_*) = \frac{H(Z_i) - H[Z^{b}_{\rm lim}(M_i - \delta M)]}{H[Z^{f}_{\rm lim}(M_i)] - H[Z^{b}_{\rm lim}(M_i - \delta M)]}, \qquad \hat{\tau}_i(m_*) = \frac{n(S_3)}{n(S_3 \cup S_4)} \simeq \frac{q_i(m_*)}{t_i(m_*) + 1}, \tag{6} $$

and r_i denotes the number of galaxies belonging to region S1, n_i
the number of galaxies belonging to S1 ∪ S2, q_i the number of
galaxies belonging to S3, and t_i the number of galaxies belonging
to S3 ∪ S4. F(M) is the cumulative luminosity function and H(Z)
is the cumulative distribution function of Z. The variances, Var(ζ_i) and
Var(τ_i), are respectively given by

$$ \mathrm{Var}(\zeta_i) = \frac{1}{12}\,\frac{n_i - 1}{n_i + 1} \tag{7} $$

© 2011 RAS, MNRAS 000, 1-17 



4 Johnston, Teodoro & Hendry 



[Figure 1 panels: horizontal axis Absolute Magnitude (M), from -22 to -14; vertical axis ticks from 33 to 40; the regions S1 and S2 and the slice width δZ are marked.]

Figure 1. Diagram illustrating the construction of the rectangular regions used to define the random variables, ζ_i(m*) (left panel) and τ_i(m*) (right panel),
for a typical galaxy at (M_i, Z_i) drawn from a survey that is subject to bright and faint apparent magnitude limits m^b_lim and m^f_lim respectively. The left hand
panel illustrates the construction of the regions S1 and S2, related to the random variable ζ_i, for a given 'trial' value m* of the faint apparent magnitude limit.
These regions are uniquely defined for a rectangular slice of fixed width, δZ, in distance modulus. The right hand panel illustrates the construction of the
corresponding rectangular regions S3 and S4, related to the random variable τ_i, for a given trial value of m*. Similarly these regions are uniquely defined
for a rectangular 'slice' of width, δM, in absolute magnitude. See text for further details.



and

$$ \mathrm{Var}(\tau_i) = \frac{1}{12}\,\frac{t_i - 1}{t_i + 1}. \tag{8} $$


Figure 1 illustrates the construction of the rectangular regions S1,
S2, S3 and S4 as well as the meaning and definition of the slices
in distance modulus, δZ, and absolute magnitude, δM. It should be
mentioned that r_i was also the notation used in EP92 to denote the rank
of the object i when galaxies are sorted by magnitude. In the
scenario illustrated in Figure 1 the quantities δZ and δM are fixed.
Therefore, for initial m* values close to the
bright limit, m^b_lim, where the chosen values of δZ and δM are too large
for sufficient sampling, neither Tc nor Tv will be calculated.
Moreover, as m* moves to increasingly fainter magnitudes, any objects
too close to m^b_lim for a ζ_i(δZ) or τ_i(δM) calculation to be made
will be omitted from the final completeness calculation.
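
The construction described in this section can be illustrated with a minimal brute-force sketch. It is not the authors' code: the function name `completeness_statistics` and its arguments are hypothetical, the counts r_i, n_i, q_i and t_i are assumed to include galaxy i itself (which is what gives an expectation of 1/2 and the variance (n - 1)/[12(n + 1)] for a uniformly distributed rank), and the widths δZ and δM are held fixed.

```python
import numpy as np


def completeness_statistics(M, Z, m_star, m_bright, dZ, dM):
    """Return (Tc, Tv) for a trial faint limit m_star, using fixed widths dZ, dM."""
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    m = M + Z                                   # apparent magnitude, from Z = m - M
    keep = (m >= m_bright) & (m <= m_star)      # objects inside the trial limits
    M, Z = M[keep], Z[keep]

    dev_c = var_c = dev_v = var_v = 0.0
    for Mi, Zi in zip(M, Z):
        # zeta: slice of width dZ in distance modulus (regions S1 and S2).
        M_b = m_bright - (Zi - dZ)              # bright limit at Z_i - dZ
        M_f = m_star - Zi                       # trial faint limit at Z_i
        if Mi >= M_b:                           # else too close to the bright limit
            box = (Z >= Zi - dZ) & (Z <= Zi) & (M >= M_b) & (M <= M_f)
            n_i = np.count_nonzero(box)                 # S1 U S2, galaxy i included
            r_i = np.count_nonzero(box & (M <= Mi))     # S1, galaxy i included
            dev_c += r_i / (n_i + 1) - 0.5
            var_c += (n_i - 1) / (12.0 * (n_i + 1))

        # tau: slice of width dM in absolute magnitude (regions S3 and S4).
        Z_b = m_bright - (Mi - dM)              # bright limit at M_i - dM
        Z_f = m_star - Mi                       # trial faint limit at M_i
        if Zi >= Z_b:
            box = (M >= Mi - dM) & (M <= Mi) & (Z >= Z_b) & (Z <= Z_f)
            t_i = np.count_nonzero(box)                 # S3 U S4, galaxy i included
            q_i = np.count_nonzero(box & (Z <= Zi))     # S3, galaxy i included
            dev_v += q_i / (t_i + 1) - 0.5
            var_v += (t_i - 1) / (12.0 * (t_i + 1))

    Tc = dev_c / np.sqrt(var_c) if var_c > 0 else float("nan")
    Tv = dev_v / np.sqrt(var_v) if var_v > 0 else float("nan")
    return Tc, Tv
```

Evaluating such a function over a grid of trial limits m* traces out curves of the kind discussed in the text: for a separable sample both statistics should scatter around zero while m* is at or brighter than the true limit, and fall systematically negative once m* is fainter than it. The double loop scales as N² and is intended only to make the definitions concrete.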



2.4 Signal-to-noise & shot-noise sampling 

In Completeness I we identified two effects that are a consequence
of adopting fixed 'slice' widths δZ for ζ and δM for τ. Note that
fixing these widths to predetermined values allows the construction
of unique, separable regions, following Equations 5 and 6, within
any survey with a well defined bright and faint apparent magnitude
limit (usually referred to as doubly truncated). However
Completeness II explored the sensitivity of our results to the (essentially
arbitrary) choices of δZ and δM. It was identified that:

(i) For very small values of δZ and δM the respective Tc and
Tv statistics will be dominated by what we may term 'shot-noise'
(since the rectangular regions they identify are extremely sparsely
sampled); this makes it impossible to draw significant conclusions
regarding the nature of the true faint apparent magnitude limit.

(ii) Conversely, when the values of δZ and δM are taken to be
very large, then for data-sets that are not well described by a single,
sharp faint end apparent magnitude limit m_lim one observes that
the behaviour of the Tc and Tv statistics appears to be sensitive to
the values of δZ and δM - which may lead to inconsistent conclusions
about the true faint magnitude limit.

These effects prompted us in Completeness II to derive expressions
for an effective signal-to-noise ratio (s/n) for Tc and Tv, given
respectively by




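$$ \left(\frac{\delta\zeta}{\zeta}\right)_i = \left[ \frac{\delta r_i^{2}}{r_i^{2}} + \frac{[\delta(n_i + 1)]^{2}}{(n_i + 1)^{2}} - \frac{2\,\delta r_i\,[\delta(n_i + 1)]}{r_i (n_i + 1)} \right]^{1/2} \tag{9} $$

and

$$ \left(\frac{\delta\tau}{\tau}\right)_i = \left[ \frac{\delta q_i^{2}}{q_i^{2}} + \frac{[\delta(t_i + 1)]^{2}}{(t_i + 1)^{2}} - \frac{2\,\delta q_i\,[\delta(t_i + 1)]}{q_i (t_i + 1)} \right]^{1/2}. \tag{10} $$
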



One can use these expressions to motivate a different method for
choosing the widths δZ and δM, one that resembles an "adaptive
smoothing" procedure: in short we allow the values of δZ and δM
to vary from galaxy to galaxy, so as to maintain a constant
signal-to-noise ratio for the statistics Tc and Tv - thereby seeking to maintain
the same amount of information, as measured by the signal-to-noise
ratio, allocated to each galaxy within the separable regions
that define our estimators of ζ and τ respectively. The appropriate
minimum target signal-to-noise value was determined by trial
and error, based on the δZ and δM values for which the Tc and
Tv statistics would fall below the −3σ level at the true apparent
magnitude limit.
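
The idea of letting the slice width vary from galaxy to galaxy can be sketched as follows. This is a simplified stand-in and not the procedure of Completeness II: instead of evaluating the signal-to-noise expressions, it widens δZ in small steps until the S1 ∪ S2 region for a given galaxy holds a minimum number of objects. The function name `adaptive_delta_Z` and the parameters `n_min`, `step` and `dZ_max` are hypothetical, and the count is assumed to include the galaxy itself.

```python
import numpy as np


def adaptive_delta_Z(M, Z, i, m_star, m_bright, n_min=20, step=0.05, dZ_max=3.0):
    """Smallest multiple of `step` for which galaxy i's S1 U S2 box holds n_min objects.

    Returns None if no width up to dZ_max qualifies.
    """
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    Mi, Zi = M[i], Z[i]
    M_f = m_star - Zi                           # trial faint limit at Z_i
    for k in range(1, int(round(dZ_max / step)) + 1):
        dZ = k * step
        M_b = m_bright - (Zi - dZ)              # bright edge moves fainter as dZ grows
        if Mi < M_b:
            return None                         # galaxy i no longer lies in its own box
        box = (Z >= Zi - dZ) & (Z <= Zi) & (M >= M_b) & (M <= M_f)
        if np.count_nonzero(box) >= n_min:      # count includes galaxy i
            return dZ
    return None
```

A count threshold is only a proxy for a constant signal-to-noise level; replacing the `n_min` test with a threshold on the relative error of the rank estimate would bring the sketch closer to the approach described above. The same loop applies to δM with the roles of M and Z exchanged.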
3 CHARACTERISING INCOMPLETENESS

The key motivation for the original Rauzy completeness test was
to provide a simple non-parametric method to determine the true
magnitude limit of a magnitude limited survey. The R01 and
subsequent Completeness I and Completeness II papers demonstrated
effectively how this is achieved. We now extend the use of our
estimators beyond probing just the faint limit of a survey. This
section firstly demonstrates the robustness of our estimators before
going on to explore the different forms of systematics to which Tc
and Tv should be particularly sensitive, noting the characteristic
features of the Tc and Tv curves that are the result of incomplete
patches in a given survey or the signatures of luminosity evolution.



3.1 Probing the robustness of Tc and Tv

As we have already discussed, our completeness estimators assume
separability between the luminosity function φ(M) and the
density function ρ(Z). Therefore, the estimators should be sensitive
only to systematic departures from this assumption and conversely
they should be essentially insensitive to random departures from
separability. In this section we demonstrate that this is indeed the
case.



3.1.1 Randomly removing sources

First we introduce a random sampling of the underlying
luminosity function. As with our previous studies in Completeness I and
Completeness II we use the Millennium Galaxy Catalogue (MGC)
to perform this analysis (for full details of the sample selection the
reader is referred to those papers). To summarise, the sample
contains 7878 galaxies out to a limiting magnitude of m_lim = 20.0 mag
in a redshift range 0.015 < z < 0.18. By applying a simple
bootstrap-by-replacement algorithm we created 2 sets of a total of
500 realisations of the parent catalogue, randomly removing 500
then 2000 sources from each realisation. The resulting Tc and Tv
curves are shown in Figure 2. On each panel the completeness test
statistics for the full survey are shown in red. The grey lines
represent the test statistics for each bootstrap sample and the blue line
shows the average of these test statistics. From the bottom panels
we can clearly see that when one removes 500 sources at random
the resulting Tc (left) and Tv (right) curves continue to trace the full
survey data extremely well. In fact, the averaged line in blue almost
completely overlaps the test statistic for the full survey data,
showing no sign of any systematic departure from the original survey
completeness result. In the top panels we extend this to randomly
removing 2000 objects in each bootstrap sample, i.e. approximately
a quarter of the parent dataset. Both Tc and Tv statistics show a
broader distribution compared to the bottom panels, but overall the
averaged values deviate only slightly from the original survey data.
From this simple test we conclude that when the source removal is
performed randomly the Tc and Tv estimators remain robust.



Figure 2. The top panels show the resulting bootstrap analysis when, for
each bootstrap sample, we uniformly randomly remove 2000 sources from
the parent catalogue. Similarly the bottom panels show the resulting Tc and
Tv curves when we randomly remove 500 sources from the parent
catalogue. On each panel are shown the completeness results for the actual MGC
survey (red dashed line), the bootstrapped samples (light grey lines), and
the averaged bootstrapped samples (solid blue line).


3.1.2 Retaining Separability 

We can also demonstrate the insensitivity of Tc and Tv to the spatial
distribution and luminosity function of the sampled objects. To do
this we remove objects from slices of distance modulus (Z) and
absolute magnitude (M) respectively. Figure 3 illustrates these two
scenarios.

By their construction both Tc and Tv should be insensitive to
spatial inhomogeneities in redshift (or distance modulus). For
illustrative purposes we demonstrate this robustness in an extreme way,
as shown in the top panels of Figure 3. The top-left panel set shows
two cases where strips of galaxies in Z have been completely
removed and the corresponding completeness has been re-assessed.
In case (1) in the top left panel set, all galaxies have been removed
between 36 < Z < 37. The resulting Tc and Tv completeness
results show little deviation from the completeness results for the
parent data-set. We then remove a second strip at 37.5 < Z < 38.
The resulting Tc and Tv (dashed lines) statistics are once again not
adversely affected. On the right-hand panel set we now randomly
replace half the galaxies that were removed from each strip to
observe if any bias in the statistics is introduced. As is evident from
the corresponding completeness curves, there is very little
perceptible change.

In a similar way we can instead remove galaxies in strips of
absolute magnitude, as we show in the bottom panels of Figure 3.
First we remove a section from −18 < M < −17 in case (1)
and then extend this region to M = −19 in case (2). Whilst there
are slight differences in Tc and Tv, compared with the results for
the full dataset, toward the faint end, there is no overall systematic
departure from completeness indicated. Once again, by randomly
replacing in the absolute magnitude slices a fraction of the removed
objects both estimators show no change - as can be seen in the
bottom right panels.

In this and the previous section we have demonstrated the con- 
ditions under which our completeness test will remain robust. In the 
following sections we now consider cases where on the other hand 
we expect the estimators to be very sensitive to incompleteness. 
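
The two kinds of source removal used in Sections 3.1.1 and 3.1.2 can be sketched as boolean masks over a catalogue. This is an illustrative sketch only: the function names `remove_random` and `remove_strip` and the `restore_fraction` parameter are hypothetical, and the completeness statistics themselves would be recomputed separately on the retained objects.

```python
import numpy as np


def remove_random(n_total, n_remove, rng):
    """Boolean keep-mask with n_remove randomly chosen objects dropped."""
    keep = np.ones(n_total, dtype=bool)
    keep[rng.choice(n_total, size=n_remove, replace=False)] = False
    return keep


def remove_strip(values, lo, hi, restore_fraction, rng):
    """Drop objects with lo < value < hi, then randomly reinstate a fraction of them.

    `values` is either the distance modulus Z or the absolute magnitude M,
    so the cut is a strip in one variable only.
    """
    values = np.asarray(values, dtype=float)
    in_strip = (values > lo) & (values < hi)
    idx = np.flatnonzero(in_strip)
    n_restore = int(round(restore_fraction * idx.size))
    restored = rng.choice(idx, size=n_restore, replace=False)
    keep = ~in_strip
    keep[restored] = True
    return keep
```

Because a strip is cut in Z alone or in M alone, the surviving sample is still consistent with a separable distribution, which is why no systematic signal is expected in either statistic. A cut defined in apparent magnitude m = M + Z couples the two variables and is the case treated in the sections that follow.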
Figure 3. Retaining a separable distribution. Using the MGC data we can demonstrate the robustness of Tc and Tv to the removal of regions on the M-Z plane,
in two distinct cases where a separable distribution is retained. The four panels on the left show that removing either horizontal or vertical sections of the M-Z
distribution does not adversely affect the Tc and Tv outcomes. Furthermore, the four right hand panels show that if we then randomly replace approximately
80% of the objects within the removed regions, the statistics remain largely unaffected. This is a direct result of the construction of both Tc and Tv where they
remain independent of any clustering effects. It is as a consequence of this robustness that the estimators are conversely very sensitive to systematic changes
in the distribution of apparent magnitude (see Figure 4).
Figure 4. Introducing crude systematic effects: missing objects. In the top left panels we have completely removed objects in the range 16.5 < m < 18.5
and 35.5 < Z < 37.5 from the original MGC survey data. The corresponding Tc and Tv statistics are shown in black and red respectively. In the bottom left
panels we now randomly return approximately half of the galaxies that were removed. The red dashed line indicates the boundaries in apparent magnitude of
this region. Whilst, 'by eye' the M-Z distribution in the bottom left panel may appear complete again, the Tc and Tv statistics are sensitive enough to detect
the residual incompleteness resulting from the fact that only about half of the removed galaxies were reinstated. The top right panels show the superimposed
Tc and Tv estimator curves for 100 mock catalogues, with the M-Z distribution for one of these mock catalogues shown in the rightmost panel. These mock
catalogues were drawn from a Universal Schechter function and as such should show no signs of incompleteness. This is confirmed in the top right panels
where grey dashed and dotted lines indicate the averaged values for Tc and Tv respectively (it should be noted that since Tc and Tv trace each other
very closely, distinguishing between their averaged values on the plot is difficult). In the bottom right panels we now take these mock catalogues and uniformly
randomly remove from them some of the galaxies in 2 small ellipsoidal regions, as indicated by the two colours, blue and green, of the sources that remain on
the right-hand panel. This crude procedure amounts to the removal of approximately only 4% of each parent catalogue, but already we can see Tc and Tv are
sufficiently sensitive that they show a characteristic, strong deviation over the relevant range of apparent magnitudes. The average of the Tc and Tv statistics
in this case displays a similar structure to that of each of the mock catalogues.
3.2 Breaking separability

We can now explore in more detail the circumstances under which
the assumption of separability breaks down. This can help us
characterise features observed in the completeness results which take
us beyond the simple notion of finding the true apparent magnitude
limit of a survey - in essence observing and quantifying systematic
departures from completeness. Figures 4 and 5 demonstrate how
basic forms of incompleteness, in apparent magnitude, will
manifest themselves in Tc and Tv.



3.2.1 Unobserved objects: 'under-completeness'

In the first scenario we remove all galaxies from a section of the
M-Z plane within a specified range of apparent magnitude 16.5 <
m < 18.5 and a distance modulus range 35.5 < Z < 37.5 (Figure 4,
top-left panel set). From a parent set of 7847 galaxies, we
have removed a total of 1048, leaving 6799 galaxies. As we can see
from the M-Z distribution on the upper right-hand of this panel set,
these limits bound a parallelogram 'slice', the removal of which thus
introduces an artificial correlation between M and Z. Of course,
this is an unrealistic systematic effect, but it provides a useful, if
exaggerated, example to demonstrate the characteristic behaviour of
Tc and Tv when galaxies have not been observed in a given apparent
magnitude region. We see that the removal of these galaxies
produces consistent behaviour in both Tc and Tv: up to m* = 16.5
both estimators are consistent with sample completeness, as we
would expect. However, as m* moves across the region of missing
objects the estimators indicate significant incompleteness, dipping
to a minimum of ~ −7σ between 16.5 < m* < 17.9. As m*
now passes beyond the 'incomplete' region, there is an immediate
peak in the estimators at ~ 9σ between 18.5 < m* < 19.2. Such
a feature appears to be characteristic of this form of incompleteness,
and is indeed to be expected as a direct result of the relation
between the separable regions in the construction of the ζ and τ
random variables. In our rather extreme example we finally see the
estimators drop below −3σ before the 'true' apparent magnitude
limit of the survey indicated by the bold vertical dashed line. We
note at this point that in our Completeness I paper a similar trend
was observed in our initial analysis of the 2dFGRS data.

In the bottom plots of the left-hand panel set of Figure 4 we
now randomly return 538 galaxies, i.e. about half the removed
galaxies, from the upper M-Z distribution. This results in a total
subsample of 7309 objects. Although we now observe a much
suppressed incompleteness signal compared to the upper panels, the
overall shape remains the same with the characteristic dip now
observed at the ~ -4σ level followed by a peak at ~ 5.5σ. We also
note, however, that the ability to constrain the faint apparent
magnitude limit appears no longer to be impeded by the incompleteness
at brighter magnitudes.
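The slice removal and partial refill described above can be mimicked on
any (M, Z) catalogue. The following is a minimal sketch and not the
procedure actually used for Figure 4: the function name remove_slice and
its arguments are hypothetical, the catalogue is assumed to be held as
NumPy arrays of absolute magnitude M and distance modulus Z, and the
apparent magnitude is taken to be m = M + Z with no k-correction.

```python
import numpy as np

def remove_slice(M, Z, m_range, Z_range, keep_fraction=0.0, rng=None):
    """Return a boolean mask that drops objects inside an (m, Z) box.

    keep_fraction is the fraction of box members that are randomly
    retained (0.0 removes every object in the box; 0.5 returns about
    half of them). Objects outside the box are always kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    m = M + Z  # apparent magnitude (assumption: no k-correction)
    in_box = ((m > m_range[0]) & (m < m_range[1]) &
              (Z > Z_range[0]) & (Z < Z_range[1]))
    # Each box member survives independently with probability keep_fraction.
    survives = rng.random(M.size) < keep_fraction
    return ~in_box | survives
```

Applying the returned mask to M and Z gives the depleted catalogue; because
the box is defined in m and Z, it maps to a parallelogram on the M-Z plane,
which is what breaks separability.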

By using simple Monte Carlo mock catalogues we can probe
this effect in a more controlled environment. We construct a total
of 100 mock catalogues, randomly drawing a total of 10,000 objects
from a Schechter LF and adopting LF parameters as estimated
in the MGC survey by Driver et al. (2005) over a similar redshift
range and out to a limiting magnitude of m_lim^f = 20.0 mag. Since we
have already shown our test statistics to be insensitive to clustering
properties, for simplicity we have restricted our mock catalogues
to have a uniform redshift distribution. In essence, our mock
catalogues represent magnitudes drawn from a Universal LF where no
systematics should be present. The top plots in the right-hand panel
set of Figure 4 show an example of an M-Z distribution (right-hand
plot) and the corresponding Tc and Tv curves superimposed
on each other for all 100 mock catalogues (left-hand plot). The grey
dashed and dotted lines shown in the left-hand plot are the respective
averaged Tc and Tv values which are both consistent with zero
up to the faint limit. Thus, it is clear from this plot that our mock
catalogues are consistent with being complete up to the apparent
magnitude limit of m_lim^f = 20.0 mag, as we would expect.

However, for each mock catalogue we now randomly remove
galaxies from two different ellipsoidal regions on the M-Z plane.
These regions are highlighted in blue and green. From each mock
this amounts to only a ~ 4% fraction of the total number in the
parent set. The impact of this removal is shown in the bottom plots
of the right-hand panel set of Figure 4. We see that removing
just a small fraction of objects results in a strong, systematic
departure from completeness caused firstly by the green ellipsoidal
region. This feature is characterised by the dip in Tc and Tv at
m* values that correspond to the green region, as indicated by the
green vertical dashed lines on the completeness plot. The following
characteristic rise in the estimators is suppressed slightly by
the blue region as we observe a flattening of Tc and Tv between
16.6 < m* < 17.7 corresponding to this region of incompleteness
and indicated on the figure by the blue vertical dashed lines. There
is then an overall peak in both estimators at an average value of
~ 4.5σ at m* ~ 18.2, before both drop once again toward the faint
limit.

The key points to take from this demonstration can be summarised
as follows. If objects have not been observed, for example,
in some apparent magnitude range out to or within a given redshift
range, our completeness estimators will show a consistent and
systematic characteristic shape that will typically lie outside the
expected statistical fluctuations. This shape takes the form of a drop
in both Tc and Tv as m* moves across the incomplete region,
followed by a distinct peak as m* moves toward fainter magnitudes
and hence more objects. To test whether such a systematic effect is
limited to a particular redshift range, one could of course split the
M-Z distribution into a series of redshift slices to probe where it is
most prominent.
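For readers who wish to experiment with these signatures, the sketch
below computes Tc for a trial limit m*. It is only an illustration under
stated assumptions: it uses the original Rauzy (2001) form with no bright
limit, ζ_i = r_i/(n_i + 1) with variance (n_i - 1)/[12(n_i + 1)], and the
function names tc_statistic and mock_catalogue, together with the Gaussian
toy luminosity function, are hypothetical choices made here rather than
anything taken from the analysis above.

```python
import numpy as np

def tc_statistic(M, Z, m_star):
    """Rauzy-style Tc for a trial faint limit m_star (no bright limit)."""
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    use = (M + Z) <= m_star            # only objects brighter than m_star
    M, Z = M[use], Z[use]
    num, var = 0.0, 0.0
    for Mi, Zi in zip(M, Z):
        # S1 U S2: objects at Z <= Zi that would pass m_star if placed at Zi
        column = (Z <= Zi) & ((M + Zi) <= m_star)
        n_i = np.count_nonzero(column)
        # S1: the part of that column with M <= Mi (object i is included)
        r_i = np.count_nonzero(column & (M <= Mi))
        num += r_i / (n_i + 1.0) - 0.5
        var += (n_i - 1.0) / (12.0 * (n_i + 1.0))
    return num / np.sqrt(var) if var > 0 else 0.0

def mock_catalogue(n, m_lim, rng):
    """Separable toy catalogue: Gaussian M, uniform Z, sharp cut at m_lim."""
    M = rng.normal(-20.0, 1.5, n)
    Z = rng.uniform(34.0, 38.0, n)
    keep = (M + Z) <= m_lim
    return M[keep], Z[keep]
```

Scanning tc_statistic over a grid of m_star values for a catalogue built
with mock_catalogue should give values scattered about zero up to the
true limit and a sharp negative excursion beyond it; the explicit loop is
quadratic in the number of objects, which is adequate for a few thousand
objects but not for a full survey.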

3.2.2 Too many objects: 'over-completeness' 

We now consider the opposite scenario where one observes an
excess of objects in an apparent magnitude bin relative to that
expected from the overall population of galaxies. Such a scenario,
which we can refer to as over-completeness, could be induced for
example by magnification effects caused by gravitational lensing
- i.e. objects that would ordinarily have been too faint to be
observed in a flux-limited survey are lensed into the catalogue.

Again, for illustrative purposes, we randomly add objects in
apparent magnitude to the survey sample, as shown in the left-hand
panel set of Figure 5. In the top plots, 500 objects have been uniformly
randomly added within 16.55 < m < 17.50 and within a redshift
range 0.04 < z < 0.07. This distribution is highlighted by the blue
points in the M-Z distribution. The resulting Tc and Tv results are
shown in the left-hand panel. The following systematic features are
observed for both estimators. As m* moves across the incomplete
region a distinct peak is observed at the ~ 3σ level over the range
16.55 < m* < 16.8. This is then followed by a systematic drop,
reaching the ~ -9σ level, and finally another peak at the ~ 6σ
level just before the estimators drop sharply at the magnitude limit
of the survey.

In the bottom panels of Figure 5 we have now added the same
number of objects as before but to a narrow magnitude range of
18.0 < m < 18.5 and extending across a wider redshift range
of 0.015 < z < 0.079. We observe a similar trend to that of the
previous example. Tc and Tv peak at the ~ 6.5σ level between
18.0 < m* < 18.5, which corresponds to the incomplete region.
We then observe a dip in the estimators beyond m* > 18.8 at the
~ -6σ level for Tv and ~ -9σ level for Tc. The estimators then
show a slight rise toward the faint limit before dropping sharply at
m* = 20.0.

Figure 5. Adding artificial systematics. The left-hand panel sets show the resulting Tc and Tv curves when we add objects in a given apparent magnitude range
to the MGC survey data. The top panels in this set have had 500 objects added randomly within the range 16.55 < m < 17.5 and 0.04 < z < 0.07. In
the bottom panels of this set the same number of galaxies were added in the range 18.0 < m < 18.5 and 0.015 < z < 0.079. In the right-hand panel set we
have taken our 100 Universal mock catalogues and added to each a similar form of systematic deviation. The top panels of this set show Tc and Tv when 400
objects have been randomly added in the range 19.0 < m < 19.1 and 0.13 < z < 0.17. In the bottom panels we now add a further 200 galaxies between
17.0 < m < 17.5 and 0.02 < z < 0.09. In both cases the averaged Tc values are shown as a grey dashed line.

As a check in a more controlled environment we once again
manipulate our mock catalogues in a similar way by adding objects
to them in a manner that breaks separability: see the right-hand
panel set in Figure 5. In the first example, we add 400 objects
randomly in a narrow magnitude range 19.0 < m < 19.1 and within
0.13 < z < 0.17. This case is highlighted in the M-Z distribution
in the top right panel in blue. We can see that both Tc and Tv
spike within the targeted magnitude interval, followed by a shallow
dip at m* ~ 19.5. In the bottom panels we now extend our
example by randomly adding another 200 objects in the intervals
17.0 < m < 17.5 and 0.02 < z < 0.09. Once again we can see
that both Tc and Tv respond in a similar way to the addition of both
systematic regions on the M-Z plane. We observe that the added
systematic in this case marginally reduces the overall amplitude of
the initial systematic spike at m* ~ 19.1.

In summarising this section, it is clear that over-densities
in apparent magnitude that break separability have the effect of
producing a positive spike in the completeness estimators at apparent
magnitudes corresponding to the 'over' complete region, followed
by a dip at fainter magnitudes. This is, in essence, the opposite
effect to that seen when galaxies were artificially removed from the
sample to represent incompleteness. Finally, one can expect to
observe a second rise in the estimator, the amplitude of which depends
on where the incomplete region occurs.

Note that in both examples of artificially induced systematic
effects investigated above, when they are added to the real survey
data in Figure 5, the completeness estimators continue to show a
sharp drop at the actual m_lim of the survey - albeit in the second
case of 'over' completeness this occurs below the usual -3σ limit.
The robustness of this effect was confirmed by our use of 100 mock
catalogues. In reality of course, survey samples are likely to be
subject to varying degrees of incompleteness, manifesting themselves
as features that may partially cancel out when incompleteness is
probed across discrete magnitude bins. However, the illustrative
examples in this section indicate that one could adopt an appropriate
weighting scheme to correct for these localised incompleteness
features, where the weight assigned to each apparent magnitude bin is
directly related to the value of the completeness estimators in that
bin. We will investigate this approach in detail in a future paper.



3.3 Revisiting the bright limit case

One of the key developments in Completeness I was extending the
completeness test to account for the presence of a bright apparent
magnitude limit in galaxy surveys. This development was motivated
in part by the completeness results obtained when analysing
a galaxy sample from the 2dFGRS. Through the use of our mock
catalogues from the previous sections we now explore more deeply how
the presence of an unmodelled bright limit affects the behaviour of
the completeness estimators and how this behaviour relates to the
results in Completeness II, where we showed that choosing a δM
and δZ that are too small can mask underlying incompleteness.

In the left-hand panels of Figure 6 we have taken the mock
catalogues drawn from a Universal Schechter LF and applied a
succession of three increasingly faint bright limits in apparent
magnitude at m_lim^b = 15.0, 16.0 and 17.0 mag. For this demonstration we
present only the results for Tc and note that Tv displayed the same
behaviour. The red lines in the figure represent the results for each
mock catalogue and the blue line in each panel shows their average.
In all cases Tc is calculated using the Rauzy estimator - i.e. no
bright apparent magnitude limit is modelled when computing the
statistic, even when a bright limit is present in the data. The top-left
panel shows the results when we do not impose any bright limit
on the dataset and the mock catalogues are thus complete within
and up to the faint apparent magnitude limit m_lim^f = 20 mag. As we
begin to impose a bright limit we observe a systematic downward
shift in Tc to increasingly negative values. Despite this behaviour,
the sharp downward trend in Tc beyond the true faint apparent
magnitude limit remains unaffected. Moreover, the presence of an
unmodelled bright limit does not induce on Tc any of the systematic
characteristics shown in the previous sections.

Figure 6. Exploring the effects of imposing a bright apparent magnitude limit. The left- and right-hand panels show the same mock catalogues as used
in Figure 4: the left-hand panels show mock catalogues drawn from a Universal LF, and the right-hand panels show the same mock catalogues but with a
systematic effect added to the apparent magnitude data. As in the previous illustrations, the red lines show Tc results for 100 mock catalogues and the dark
blue line in each panel shows the average value of Tc from these 100 mock catalogues. In each case we apply the Rauzy estimator to the data. From top to
bottom we introduce successive cuts in apparent magnitude at increasingly fainter magnitudes, with the top panels showing the initial case of having no bright
limit. The remaining panels are respectively cut at m_lim^b = 15.0, 16.0 and 17.0 mag. The key result in this figure is the demonstration that the overall features of
the completeness test remain largely unaffected, but do suffer a systematic downward shifting in Tc at apparent magnitudes brighter than the faint limit as we
move from the case where there is no bright limit to a gradual well defined bright limit.

To explore this behaviour further we then applied the same
bright apparent magnitude cuts to the mock data samples but this
time with a systematically under-sampled region included in each
mock catalogue, in the manner described in the previous section.
This was achieved by randomly removing the single ellipsoidal
region shown in green on the right-hand panel set of
Figure 4. Again in all cases Tc is calculated using the Rauzy
estimator. The top-right panel shows the results of the completeness test
when no additional bright magnitude limit is imposed. The right-hand
panels below then show the results of imposing an increasingly
deeper bright limit on the data. As we cut deeper in magnitude
to m_lim^b = 15.0 and 16.0 we can clearly see the same downward shift
in Tc evident in the left-hand panels. Perhaps more importantly, the
systematic dip at m* ~ 17 caused by the under-sampled region in
the mock catalogues also remains. In the bottom right panel, with a
bright apparent magnitude limit at m_lim^b = 17.0, we see that Tc now
exhibits the same trend as in the corresponding left-hand panel.
This is because the bright cut in apparent magnitude is in this case
already fainter than the region where the systematic undersampling
is introduced.

We conclude, therefore, that if a survey sample has an
unmodelled bright apparent magnitude limit then our completeness
test statistics will be systematically shifted below -3σ over a wide
range of trial faint apparent magnitude limits until m* moves
beyond the true faint limit of the survey - at which point the test
statistics will again drop very sharply. Moreover, any localised
systematic features in Tc or Tv resulting from incomplete survey data
will still be present. Thus, the characteristic incompleteness results
which we obtained in Completeness I when we analysed the 2dFGRS
data would appear to be the result of not only an unmodelled
bright magnitude limit, but also the presence of additional
systematic effects in the survey, of the form illustrated in § 3.2.1.

3.4 A probe of luminosity evolution 

Source evolution in survey data can be thought of as another form
of incompleteness. There are a number of methods available to
constrain the statistical properties of evolution for a population of
galaxies. Probably the most common approach is to parameterise
evolution with a redshift z dependent model where either pure
luminosity evolution (PLE) [e.g. L*(z) = L*(0)(1 + z)^k] or pure
number density evolution [e.g. φ*(z) = φ*(0)(1 + z)^γ], or a
combination of both, is inferred from the estimated luminosity functions
as a function of redshift. In the PLE case it is generally assumed
that galaxies were brighter in the past, where L* is the
characteristic luminosity of the LF and k is a galaxy type dependent
evolution parameter. Similarly, with number evolution γ is an
evolution parameter which assumes galaxies were more numerous in the
past, where φ* is the normalisation of the LF. It is common practice
to constrain these models using a maximum likelihood estimation
(MLE) technique, involving an assumed parametric form for the LF
(see e.g. Saunders et al. 1990; Heyl et al. 1997; Springel and White
1998; Croom et al. 2004; Wall et al. 2008). In both cases it is
assumed when carrying out the MLE that the underlying evolutionary
model is the correct one to describe the entire population of
galaxies in the sample under test.

In this section we briefly show how our completeness estimators
can be used as a probe of PLE without requiring any knowledge
of the parametric shape of the LF. In a follow-up paper we
will demonstrate how we can adopt this approach to constrain the
parameter(s) of a PLE model, although again without requiring any
parametric model for the LF.

For this initial study we draw upon the work of Croom et al.
(2004) (hereafter C04) who constrained evolutionary models for
high redshift quasi-stellar objects (QSOs) over a broad redshift
range. As is common with QSO studies they adopt a two power-law
LF of the form,

\Phi(M, z) = \frac{\Phi^*}{10^{0.4(\alpha+1)[M - M^*]} + 10^{0.4(\beta+1)[M - M^*]}} ,    (11)

where M* is the characteristic absolute magnitude, Φ* is the
normalisation, and α and β determine the slopes of the respective
power laws (see also Boyle et al. 1988). In C04 they then
characterise evolution as a second order polynomial expressed in
magnitudes as,

M^*(z) = M^*(0) - E(z) = M^*(0) - 2.5\,(k_1 z + k_2 z^2) ,    (12)

where k1 and k2 are the evolution parameters, which are analogous
to the PLE evolution parameter described earlier in this section. In our
study we use the C04 survey data as a guide to create mock QSO
catalogues which have sufficient depth in redshift to probe evolution.
We provide only the details of these data which are pertinent
to this study. For further details please refer to C04 and references
therein. In this scenario we assume that the k-correction is known.
Such a correction is required since galaxies are observed at different
redshifts making use of single band filters, thus sampling a
fraction of the total spectrum. To compare the measurements at
different redshifts one will need to convert the observation to the rest
frame of the object and therefore to correct for the finite size of
the filter(s). In practice k-corrections are approximated using
two-dimensional polynomials as a function of redshift and observed
colour. Effectively, k-correction and PLE are degenerate and will
impact on the overall Tc behaviour in a similar manner. Another
subtle issue concerning the k-correction is the way one defines
the areas S1 and S2 and the magnitude limits themselves: if the
galaxies were first selected and then k-corrected such limits would
become "fuzzy" since different galaxy types and intrinsic colours
would require different corrections at a given redshift. However, the
usual procedure is to apply a magnitude cut after k-correction. Thus
our implementation of Tc adopts a "top-hat" approach to partition
the data-sets rather than using "fuzzy" magnitude limits.

The parent data-set comprised 23,338 QSOs within the
apparent magnitude range 18.25 < b_J < 20.85. The bright end of
the luminosity function was essentially extended with the introduction
of a further 1564 QSOs in the range 16.0 < b_J < 18.25 from
the 6dF QSO redshift survey data-set. The redshifts were selected
in the redshift range 0.4 < z < 2.1. In C04 the authors impose
a further cut removing all sources with M > -22.5. With this subset
they then apply a maximum likelihood technique to simultaneously
constrain the LF and E(z) parameters of Equations 11 and 12. The
results of this analysis are reproduced in the top row of Table 1.



3.4.1 The no evolution case 

In the first step of our analysis we create a set of control samples by
generating mock catalogues drawn from a Universal LF. Throughout
we adopt the same cosmology as in C04 such that Ωm = 0.3,
ΩΛ = 0.7 and H0 = 70 km s^-1 Mpc^-1. Essentially, drawing from a
Universal LF means that we do not add the evolution term of
Equation 12 and thus the assumption of separability between Φ(M)
and ρ(Z) should remain valid for our control samples. We note that
the constrained LF parameters from C04 are derived from observed
magnitudes that are subject to evolution. As Table 1 shows, we adopt
the same LF parameters as C04 with the exception of setting a
slightly brighter M*(0) value of -22.8 mag to produce a more sensible
magnitude distribution.

Table 1. Luminosity and evolution function parameters used to generate
Monte Carlo mock magnitudes over the redshift range 0.4 < z < 2.1.

Details        M_lim    M*(0)    α       β       k1      k2      m_lim^f
C04            -22.5    -21.61   -3.31   -1.09   1.39    -0.29   20.85
Universal LF   -        -22.8    -3.31   -1.09   -       -       20.85
Evolved LF     -        -21.61   -3.31   -1.09   1.39    -0.29   20.85

Figure 7. Probing pure luminosity evolution (PLE). Top panel - Each of
the 100 mock catalogues (grey lines) was drawn from a Universal (no
evolution) two-power law luminosity function as described in the text. As we
would expect from a mock catalogue without an evolving M* term in the
luminosity function, all the mock catalogues show consistent completeness
up to the faint apparent magnitude limit of m = 20.85 mag. Middle panel
- we now generate absolute magnitudes for our mock catalogues from an
evolving luminosity function, where M* is allowed to vary according to
a redshift dependent PLE model taken from C04. This model effectively
breaks the separability between M and Z and the resulting Tc curves for
each mock catalogue show a characteristic drop below -3σ at
approximately m* ~ 19.6. The red line shows the completeness statistic for the
actual C04 QSO data prior to applying any evolution correction. We can see
the same characteristic drop-off, indicating the presence of evolution, for
the real data. Bottom panel - Finally, we now correct our evolved mock data
with the evolution function that was originally used to generate the
magnitudes from the evolving luminosity function. The grey lines, for the mock
catalogues, now show results consistent with a complete sample. Similarly,
correcting the C04 QSO data with their constrained evolution parameter
values we can see their data are now consistent with completeness up to their
published limiting magnitude of m = 20.85 mag.

For simplicity, we generate a uniform random sample of redshifts
in the range 0.4 < z < 2.1 throughout this study. In future
work we will introduce clustering effects; however it should be
noted that, by design, our completeness estimators are insensitive
to clustering. Furthermore we do not impose any cuts in absolute
magnitude. Thus, for an object at redshift z_i we randomly sample
an absolute magnitude from the cumulative distribution function
(CDF) of the adopted LF, Equation 11, and compute the following,

m^{uni}_{samp}(z_i) = M^{uni}_{samp} + Z_i ,    (13)
Z_i = 5 \log_{10}[d_L(z_i)/\mathrm{Mpc}] + 25 ,    (14)

where m^uni_samp denotes the sampled apparent magnitude from a
universal LF, Z_i is the distance modulus and d_L is the luminosity
distance. The superscript uni refers to sampling from a Universal LF.
Final selection of a galaxy must meet the following condition,

m^{uni}_{samp}(z_i) < m^{f}_{lim}(\mathrm{survey}) ,    (15)

where m_lim^f is defined by the C04 QSO survey limit.
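The conversion from redshift to distance modulus used when building the
mock catalogues can be sketched as follows. This is an illustrative
implementation only: it assumes a flat universe with the C04 parameters
quoted above (Ωm = 0.3, ΩΛ = 0.7, H0 = 70 km s^-1 Mpc^-1), neglects
radiation and any k-correction, and the function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance(z, omega_m=0.3, omega_l=0.7, h0=70.0, steps=2048):
    """Luminosity distance in Mpc for a flat universe (scalar redshift z)."""
    zs = np.linspace(0.0, z, steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + omega_l)
    # Trapezoidal rule written out so it does not depend on the NumPy version.
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    return (1.0 + z) * (C_KM_S / h0) * integral

def distance_modulus(z, **cosmo):
    """Z = 5 log10(d_L / Mpc) + 25."""
    return 5.0 * np.log10(luminosity_distance(z, **cosmo)) + 25.0

def apparent_magnitude(M, z, **cosmo):
    """m = M + Z for an absolute magnitude M at redshift z."""
    return M + distance_modulus(z, **cosmo)
```

A sampled object would then be retained only if apparent_magnitude(M, z)
is brighter than the adopted faint limit, here 20.85 mag.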

From this starting point we generate 100 realisations each 
comprising a total of 18662 objects. With no redshift dependency 
in the absolute magnitudes generated for our mock surveys, they 
should of course be magnitude complete. To verify this we com- 
pute the Tc statistic for each mock survey. The results are shown 
in the top panel of Figure 7, where we have superimposed each 
of the 100 Tc curves onto this plot, shown in grey. As expected, 
our results show no indication of incompleteness or other system- 
atic effects, with the Tc statistic for each mock catalogue dropping 
sharply beyond the apparent magnitude limit of 20.85 mag. 

3.4.2 The evolution case 

We now consider the case of an evolving LF. More explicitly,
Equation 11 now becomes,

\Phi(M, z) = \frac{\Phi^*}{10^{0.4(\alpha+1)[M - M^*(z)]} + 10^{0.4(\beta+1)[M - M^*(z)]}} .    (16)

Thus, for each object located at z_i a unique CDF for M is generated
from which a mock absolute magnitude is sampled. Note that we
adopt the same redshift distribution as in the no-evolution case.
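One way to realise this per-object sampling is by inverse-CDF sampling
on a tabulated LF. The sketch below is a hedged illustration rather than
the code used for Figure 7: it takes the evolved-LF parameters of Table 1
as defaults, exploits the fact that the two power-law form depends on M
only through x = M - M*(z), and truncates the LF to a finite range
x_range, which is an arbitrary assumption needed to normalise the CDF.
The function names are hypothetical.

```python
import numpy as np

def evolution(z, k1=1.39, k2=-0.29):
    """E(z) = 2.5 (k1 z + k2 z^2), in magnitudes (Equation 12)."""
    z = np.asarray(z, dtype=float)
    return 2.5 * (k1 * z + k2 * z ** 2)

def sample_evolved_magnitudes(z, rng, m_star0=-21.61, alpha=-3.31, beta=-1.09,
                              x_range=(-6.0, 3.0), n_grid=4096):
    """Draw one absolute magnitude per redshift from the evolving LF.

    A single CDF is tabulated in x = M - M*(z) and shifted by M*(z) for
    each object, which is equivalent to building a separate CDF in M at
    every redshift. x_range is an assumed truncation, not a survey cut.
    """
    z = np.asarray(z, dtype=float)
    x = np.linspace(x_range[0], x_range[1], n_grid)
    phi = 1.0 / (10.0 ** (0.4 * (alpha + 1.0) * x) +
                 10.0 ** (0.4 * (beta + 1.0) * x))
    cdf = np.cumsum(phi)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])   # normalise to [0, 1]
    u = rng.random(z.size)
    x_draw = np.interp(u, cdf, x)               # inverse-CDF sampling
    m_star_z = m_star0 - evolution(z)           # M*(z) = M*(0) - E(z)
    return m_star_z + x_draw
```

Setting k1 = k2 = 0 in evolution recovers the Universal (no-evolution)
case, so the same routine can generate both sets of mock magnitudes.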

Table 1 shows the parameters adopted for the LF and the
evolution term. For clarity, in this scenario for each 'evolved' absolute
magnitude sampled, M^ev_samp, we compute the corresponding apparent
magnitude, m^ev_samp (Equations 17 and 18), where the superscript ev
reminds us that the magnitudes are drawn from an evolving LF.

When we now compute Tc for our modified set of mock
catalogues we do not expect a result consistent with completeness. The
middle panel of Figure 7 confirms this, showing in grey the
resulting Tc curves for this suite of mock catalogues. In clear contrast
to the non-evolved case of the previous section we now observe
a steady and systematic decline of Tc for each mock catalogue,
with the average of the grey lines crossing the -3σ threshold at
m* ~ 19.6 mag, i.e. 1.25 mag brighter than the true faint limit of
the datasets. We note that there are no other signs of systematic
departure from completeness observed between 16.0 < m* < 19.6.

The red line on the middle panel shows the Tc statistic
computed for the C04 observed QSO sample (uncorrected for
evolution). It is reassuring to note that we observe the same characteristic
drop in Tc as with the mock catalogues, albeit at a slightly brighter
m* value of 19.30 mag. We also observe a small spike in the C04
Tc curve at m* ~ 18.4 which may be due to the transition to the
combined 6dF data.



3.4.3 Correcting for evolution 

To conclude this section we now consider the effect of correcting
the evolved mock samples with the evolution model used to
generate them. In this way we now compute the corrected magnitudes,
M^corr and m^corr, by applying E(z) to the evolved mock magnitudes
(Equations 19 and 20), where the superscript corr represents the data
corrected for evolution. In the absence of any other systematic
effects, the correct evolutionary model applied as above should now
render the M-Z distribution separable and should thus produce a test
statistic that is once again consistent with completeness. If we now
look at the bottom panel of Figure 7 we can see that this is indeed
the case. By correcting the evolved M-Z distribution with the
appropriate evolutionary model, the Tc curves are now consistent with
the results obtained for the Universal case, in the top panel.

Finally, we correct the actual C04 data in the same way. The 
resulting Tc statistic is shown in red in the bottom panel. We again 
see that the corrected C04 data are consistent with completeness up 
to the published apparent magnitude limit. 
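The logic of the correction can be illustrated with a few lines of code.
This sketch assumes, following the sign convention of Equation 12, that
evolved magnitudes are brighter than their Universal counterparts by
E(z), so that the correction simply adds E(z) back; the function names
and the toy magnitudes are hypothetical.

```python
import numpy as np

def evolution(z, k1=1.39, k2=-0.29):
    """E(z) = 2.5 (k1 z + k2 z^2), in magnitudes (Equation 12)."""
    z = np.asarray(z, dtype=float)
    return 2.5 * (k1 * z + k2 * z ** 2)

def correct_for_evolution(M_ev, z, k1=1.39, k2=-0.29):
    """Undo M*(z) = M*(0) - E(z) by adding E(z) back to each magnitude."""
    return np.asarray(M_ev, dtype=float) + evolution(z, k1, k2)
```

With the correct (k1, k2) the corrected magnitudes lose their dependence
on redshift, whereas an incorrect model leaves a residual trend with z,
which is the departure from separability that Tc responds to.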

In a follow-up paper we shall explore in more detail how we 
can use our completeness estimators to constrain PLE models by 
either assuming a parametric model of evolution or as a free-form 
technique. In both cases the important feature of our method is that 
no parametric form for the LF, or indeed for the spatial distribution 
of the sources, is required. 



4 Tc AND Tv ERROR PROPAGATION

Whilst the optimisation in Completeness II is useful for maximising
the signal to noise ratio during the sampling process, a useful
further step would be to understand the error on both Tc and Tv either
for a given (s/n) level or when applying the R01 or Completeness
II approaches.



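Let us first re-define ζ_i for survey object i as a function of m*:

\zeta_i(m_*) =
\begin{cases}
0, & \text{for objects for which one cannot compute } r_i(m_*) \text{ and } n_i(m_*), \\
\dfrac{r_i(m_*)}{n_i(m_*) + 1}, & \text{otherwise,}
\end{cases}    (21)

recalling that r_i is the number of objects in S1 and n_i is the number
of objects in S1 ∪ S2.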

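Thus from the expression for Tc(m*), Equation 3, the error
propagation can then be shown to be,

\Delta T_c(m_*) = \sum_i \left[ \frac{\Delta\zeta_i(m_*)}{\Xi(m_*)}
    - \frac{T_c(m_*)\,\Delta n_i(m_*)}{2\,\Xi^2(m_*)}
      \frac{\partial\,\mathrm{Var}[\zeta_i(m_*)]}{\partial n_i} \right]
  = \sum_i \frac{1}{\Xi(m_*)} \left\{ \Delta\zeta_i(m_*) - \Psi_i(m_*)\,\Delta n_i(m_*) \right\} ,    (22)

where

\Xi(m_*) = \left[ \sum_i \mathrm{Var}[\zeta_i(m_*)] \right]^{1/2}

and

\Psi_i(m_*) = \frac{T_c(m_*)}{2\,\Xi(m_*)} \frac{\partial}{\partial n_i} \mathrm{Var}[\zeta_i(m_*)] .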

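Thus the covariance between two Tc's corresponding to two
magnitude limits, m* and m'*, is given by,

\langle \Delta T_c(m_*)\,\Delta T_c(m'_*) \rangle =
  \sum_{i,j} \frac{1}{\Xi(m_*)\,\Xi(m'_*)}
  \Big\{ \langle \Delta\zeta_i(m_*)\,\Delta\zeta_j(m'_*) \rangle
  + \langle \Delta n_i(m_*)\,\Delta n_j(m'_*) \rangle \times \Psi_i(m_*)\,\Psi_j(m'_*)
  - \langle \Delta n_i(m_*)\,\Delta\zeta_j(m'_*) \rangle \times \Psi_i(m_*)
  - \langle \Delta n_i(m'_*)\,\Delta\zeta_j(m_*) \rangle \times \Psi_i(m'_*) \Big\} .    (23)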

The last three terms of the above expression contain factors
of the random variable ζ_i - 1/2 [via Ψ_i(m*)]. In the case that
m* ≠ m'*, these will lead to very small contributions to the overall
covariance, since ζ_i(m*) - 1/2 follows a uniform distribution
centred around zero. However, if ζ_i is not uniformly distributed on the
interval [0, 1] then the terms containing ζ_i - 1/2 will not
necessarily vanish. This scenario will occur, for example, with the
breaking of separability discussed in Section 3. Thus for a complete
data-set and for m* ≠ m'* (where m* and m'* are both brighter than
m_lim^f, the true faint limit of the survey) the covariance matrix
acquires a rather simple form:



\langle \Delta T_c(m_*)\,\Delta T_c(m'_*) \rangle =
  12 \sum_{i,j} \frac{\langle \Delta\zeta_i(m_*)\,\Delta\zeta_j(m'_*) \rangle}
                     {[N_{gal}(m_*)\,N_{gal}(m'_*)]^{1/2}} .    (24)

Here we are also assuming that n_i is large enough and consequently
Ξ(m*) → [N_gal(m*)/12]^{1/2}, where N_gal(m*) ≠ N_gal
since only objects brighter than m* are used to compute Tc(m*).
The contributions from the individual objects j and i are shown
schematically in the right panel of Figure 8. From Equation 9 we
can see that the computation of errors in the random variable ζ_i
reflects the Poisson fluctuations of counting objects within areas S1
and S1 ∪ S2, namely the rank r_i and n_i, respectively.
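The large-n_i limit of the normalisation can be checked numerically with
a short sketch. It assumes the R01 variance Var[ζ_i] = (n_i - 1)/[12(n_i + 1)],
which is not restated in this section, and the function names are
hypothetical.

```python
import numpy as np

def xi_norm(n):
    """Xi = sqrt(sum_i Var[zeta_i]), Var[zeta_i] = (n_i - 1) / (12 (n_i + 1))."""
    n = np.asarray(n, dtype=float)
    return np.sqrt(np.sum((n - 1.0) / (12.0 * (n + 1.0))))

def xi_large_n(n_gal):
    """Large-n_i limit of Xi for n_gal objects: sqrt(n_gal / 12)."""
    return np.sqrt(n_gal / 12.0)
```

For objects with only a handful of neighbours in S1 ∪ S2 the exact sum is
noticeably smaller than the limiting value, so the simplified covariance
is best regarded as an approximation for well-populated regions.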

As for the case where m* = m'* (see the middle
panel of Figure 8), the contribution of the term containing
⟨Δn_i(m*)Δn_j(m*)⟩ will be restricted to i = j. The last two
terms of Equation 23 will still vanish for a large enough number
of objects in the catalogue insofar as the catalogue is complete and
m* is brighter than the survey faint limit. The contributions from
the individual objects j and i are shown schematically in the middle
panel of Figure 8.

[Figure 8: three panels showing the S1 and S2 regions; horizontal axis: Absolute Magnitude (M).]

Figure 8. Illustrating the manner in which we account for the correlations between certain ζ calculations. Left panel: an illustrated complete data-set with well
defined cuts in apparent magnitude at both the faint and bright end. Applying the traditional Rauzy (R01) approach, one can see that the S1 and S2 regions
are no longer separable due to the presence of the bright limit. Middle panel: in this example we have shown two arbitrary regions that would be used to
estimate ζ_i and ζ_j for respective galaxies located at (M_i, Z_i) and (M_j, Z_j) for a given m*. As such both galaxies form the respective regions [S1_i, S2_i] and
[S1_j, S2_j]. We observe by the coloured regions that there is a correlation introduced into the overall ζ calculation where these regions overlap. That is, ζ_i and
ζ_j are not independent. In this particular case we have the following scenarios where: [S1_i ∪ S1_j] (red region), [S2_i ∪ S1_j] (green region) and [S2_i ∪ S2_j]
(blue region). Right panel: we now consider correlations when m* ≠ m'*. That is, we compare the (M_i, Z_i) region for m* and the subsequent (M_j, Z_j)
region for a fainter m* value denoted by m'*.

© 2011 RAS, MNRAS 000, 1-17



5 DISCUSSION & FUTURE WORK 

In the first part of this article we examined more closely the
conditions under which systematic effects in survey data will manifest
themselves in our completeness statistics. To demonstrate this
effectively, we firstly probed the conditions under which our
completeness estimators remained robust to changes in the data. Using
actual survey data we uniformly randomly removed objects from
the catalogue via a bootstrapping approach. In a second test, we
removed slices of data in either Z or M. This too resulted in no
significant change in either Tc or Tv. Therefore, in both cases where
separability between M and Z was retained, Tc and Tv were shown
to be robust.

Systematic effects in m were then explored for three cases 
which can be summarised as follows: 

(i) Under-completeness: This may arise when objects have not been observed over a particular
redshift range and within apparent magnitude bins. Through the use of real and mock data,
we demonstrated that if this effect is present at even the level of a few percent, a
characteristic signature will be observed in Tc and Tv, manifest as a significant drop in
both estimators for trial apparent magnitude limits that lie within the incomplete
region(s), followed by a steep rise in Tc and Tv for fainter m* values.

(ii) Over-completeness: Here there may be more objects observed in a particular apparent
magnitude range relative to the underlying distribution. In this case Tc and Tv are observed
to display the opposite characteristics to those of case (i). Thus, as m* moves across the
incomplete region on the M-Z plane, a distinct peak in both Tc and Tv occurs followed by a
distinct dip as m* moves back into the 'complete' region.

(iii) Evolution: For the particular case of luminosity evolution studied here, we
demonstrated that mock magnitudes which are drawn from an evolving LF will show a
characteristic and sustained drop in Tc below −3σ at an m* value significantly brighter than
the actual apparent magnitude limit of the sample. Applying the correct evolutionary model
to both M and Z corrects this form of incompleteness. This characteristic behaviour for Tc
was confirmed with the use of real QSO survey data from C04, which is known to display
strong evolutionary properties.

We note, however, that in cases (i) and (ii) we used a highly artificial scheme to introduce
systematics to the data. In reality, one might expect some combination of these
incompleteness effects to be present in the data, which may therefore partially cancel out.
Nevertheless one might adopt, for example, an iterative weighting scheme to correct for any
residual signatures in Tc or Tv. This approach will be investigated in a later paper in the
context of estimating the galaxy luminosity function.

In the first part of this paper we also revisited the impact on Tc or Tv of surveys that
have an unmodelled secondary bright apparent magnitude limit. The purpose of this was also
to revisit our initial completeness results for the 2dF survey which led to the extension
of Tc for the bright limit case. We showed that imposing a successively fainter, but
unmodelled, bright magnitude limit resulted in a systematic downward 'shifting' in the value
of the completeness statistics. As the bright limit became fainter, the systematic shifting
became more negative. We also showed that for the mock catalogues with a small systematic
perturbation added to the apparent magnitude distribution, the systematic dip in Tc
corresponding to the affected region also remained and was, if anything, more significant.
Thus, reconsidering our 2dF results from Completeness I, we surmise that the systematic dip
observed in the Tc and Tv statistics was more likely the result of some other underlying
systematic effect not related to the bright limit of the survey, but which was then
subsequently masked by our choice of narrow width in δZ and δM, as explored in
Completeness II.

In future work we will also explore how the properties of our random variables ζ and τ can
be exploited to constrain luminosity evolution in galaxy surveys. In essence, ζ and τ are
powerful probes for identifying residual correlations in M and Z due to evolution. Thus, to
constrain evolutionary models, we can extend our methodology to include e.g. the
Kullback-Leibler divergence (Kullback and Leibler 1951) relative entropy method, which
measures the difference between two probability distributions p and q, where p represents
the observed distribution of the data (ζ or τ in our case) and q represents our theoretical
model for that distribution (i.e. after application of a specific correction for luminosity
evolution).
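
As a purely illustrative sketch of the relative entropy idea (not an implementation used in
this paper), the divergence between a binned observed distribution p and a model distribution
q could be computed as below. The function name `kl_divergence`, the choice of 20 bins, the
toy samples and the use of a uniform model q on [0, 1] are all assumptions made for this
example only.

```python
import numpy as np


def kl_divergence(p_counts, q_probs, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(p || q), in nats.

    p_counts: histogram counts (or probabilities) of the observed variable.
    q_probs:  model probabilities for the same bins.
    """
    p = np.asarray(p_counts, dtype=float)
    q = np.asarray(q_probs, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0  # bins with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))


# Toy illustration (hypothetical samples, not survey data).
rng = np.random.default_rng(0)
n_bins = 20
q_uniform = np.full(n_bins, 1.0 / n_bins)   # assumed model: uniform on [0, 1]

zeta_ok = rng.uniform(0.0, 1.0, 5000)       # sample consistent with the model
zeta_bad = rng.beta(2.0, 1.0, 5000)         # skewed sample, mimicking a residual correlation

p_ok, _ = np.histogram(zeta_ok, bins=n_bins, range=(0.0, 1.0))
p_bad, _ = np.histogram(zeta_bad, bins=n_bins, range=(0.0, 1.0))

d_ok = kl_divergence(p_ok, q_uniform)
d_bad = kl_divergence(p_bad, q_uniform)
print(f"D(p||q), consistent sample: {d_ok:.4f}")
print(f"D(p||q), skewed sample:     {d_bad:.4f}")
```

In such a scheme one would presumably vary the parameters of the evolutionary correction and
prefer those that minimise the divergence, although the details of any such procedure are
left to the future work mentioned above.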

In the final part of this paper we presented a full error propagation and covariance
analysis for the Tc and Tv statistics, further developing the "adaptive smoothing" procedure
introduced in Completeness II.

By addressing all these issues we believe that we have laid a very comprehensive foundation
for testing magnitude completeness limits that will become crucial for the next generation
of redshift surveys.

Figure A-2. The bottom panel shows the MGC redshift distribution (black)
compared to 1000 Monte Carlo (MC) simulated distributions (red). The
simulated redshifts were randomly drawn from the CDF of the observed
distribution. However, for consistency with the sampled magnitudes, the
density of galaxies equalled that contained within the z-slices detailed in
Figure A-1. The top panel shows the corresponding normalised residuals.

A1 APPENDIX: CREATING MONTE CARLO SURVEY SIMULATIONS

To investigate error propagation for the Tc and Tv estimators it was useful to generate
realistic Monte Carlo (MC) simulations of a galaxy survey. Clearly there are several
approaches to generating such MC simulations. A popular method is to utilise numerical
cosmological simulations (see e.g. Cole et al. 1998; Norberg et al. 2002). This procedure
can be summarised as follows:

(i) Randomly sample dark matter halos from a cosmological N-body simulation, generated e.g.
in a ΛCDM framework;

(ii) Employ an algorithm to 'assign' galaxies to the sampled dark matter halos (see e.g.
Berlind and Weinberg 2002) in a manner that mimics the clustering and the luminosity
distribution of the survey under study. Clustering statistics are usually quantified via
e.g. the spatial two point correlation function, while luminosities are typically drawn
from e.g. a Schechter function, with parameters M*, α and φ*, which are inferred from
observations of the real survey and should represent the present day luminosity function of
the target galaxy population;

(iii) Repeat the above steps as many times as required, selecting the sampled 'galaxies'
that are consistent with the survey selection function - at which point an apparent
magnitude is generated to be consistent with the redshift of the galaxy derived from the
numerical simulation.

Figure A-1. Schematic representation of how we sampled magnitudes to produce our Monte Carlo simulations. The left-hand panel shows the MGC M-z
distribution. The horizontal lines indicate the slices in redshift of equal density within which a CDF of the absolute magnitudes is created. However, it should
be noted that the length of each line is merely showing the extent between the brightest and faintest galaxy at that boundary. The right-hand panel shows the
corresponding CDFs from which a mock magnitude is sampled. The red line indicates the CDF for the initial redshift bin and the green line, the final bin.

Figure A-3. The bottom panels show the MGC magnitude distributions (in
blue) compared to 1000 Monte Carlo (MC) simulated distributions (in
red). These MC absolute magnitudes were drawn using an inverse sampling
technique that utilised the MGC survey data (see Figure A-1). The
absolute magnitudes were then converted to apparent magnitudes, m, as shown
on the right-hand panel. The top panels are the corresponding normalised
residuals between the MCs and the survey data.

In this paper, however, and in the spirit of our desire to develop and apply non-parametric
methods, we have employed a rather simpler prescription for generating mock data that retain
the same statistical properties as the real survey under study. Specifically, we generate
galaxy redshifts and magnitudes by sampling directly from the observed cumulative
distribution of these variables in the real galaxy survey†. In addition to its robustness,
such an approach has the obvious advantage of not requiring the scale of computing power
necessary for cosmological N-body simulations - although one recognised limitation is that
the method lacks scope for generating multiple realisations (or sampling from multiple
locations in a single realisation) in order to mitigate the impact of cosmic variance -
which may be particularly important in smaller volumes at high redshift (see e.g. Somerville
et al. 2004). Notwithstanding the above limitation, our robust method simulates the observed
magnitude and redshift distribution extremely effectively. We now describe the steps
involved in our procedure for the specific example of the MGC survey.

† Note that, in order to simulate accurately the spatial clustering inherent
in the real galaxy survey, ideally our mock surveys should also mimic the
observed angular coordinates of the real survey. While this step would be
quite straightforward to implement, since in this paper we make no further
use of any directional information we do not perform it for the mock galaxy
surveys presented here.

We want our mock surveys to reproduce as accurately as possible the observational selection
effects to which the real survey data are subject. Of course those selection effects are
manifest in a plot of the (uncorrected) M-Z distribution for the MGC galaxies; this is shown
in Figure A-1 and we clearly see that the bright and faint apparent magnitude limits render
the distribution of absolute magnitude for observable galaxies strongly dependent on
redshift. We explicitly include this dependence as follows:

(i) We divide up the redshift distribution into a series of redshift bins, each containing
an equal number of galaxies. The boundaries of these bins are indicated by the horizontal
lines in the left-hand panel of Figure A-1.

(ii) For each redshift bin we compute the sample CDF of absolute magnitude for observable
galaxies within that bin. These sample CDFs are shown in the right-hand panel of Figure A-1,
where we see that they do indeed clearly vary with redshift bin, as expected. The sample
CDFs for the first redshift bin and last redshift bin are denoted by the red and green
curves respectively.

(iii) We also compute the sample CDF of redshift for the entire MGC survey.

(iv) We then generate redshifts for our mock survey by repeatedly sampling random redshift
values from the sample CDF which we computed at step (iii), checking as we do so that the
number of mock galaxies sampled in each of the bins shown in the left-hand panel of
Figure A-1 matches closely the number of real MGC galaxies found in that redshift bin. We
carry out our sampling using the well-known "inverse sampling" method, based on the
probability integral transform, as described in e.g. Section 7.2, Page 287 of Numerical
Recipes (Press et al. 1996).

(v) Finally for each mock galaxy we assign a corresponding absolute magnitude by first
identifying to which redshift bin the galaxy belongs and then drawing a random value from
the appropriate sample CDF of absolute magnitude which we computed at step (ii). Again we
use the inverse sampling method to generate random absolute magnitudes.
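
The five steps above can be sketched in a few lines of Python. This is a minimal illustration
under stated assumptions rather than the code used for the MGC mocks: the helper names
(`inverse_sample`, `bin_index`, `make_mock`), the linear interpolation of the sample CDF, the
choice of 10 bins and the toy input catalogue are all hypothetical. The sketch also omits the
check in step (iv) that the mock counts per bin closely match the observed counts.

```python
import numpy as np


def inverse_sample(values, size, rng):
    """Draw `size` values from the sample CDF of `values` by inverse sampling."""
    sorted_vals = np.sort(np.asarray(values, dtype=float))
    cdf = np.arange(1, sorted_vals.size + 1) / sorted_vals.size
    u = rng.uniform(0.0, 1.0, size)          # uniform deviates on [0, 1)
    return np.interp(u, cdf, sorted_vals)    # interpolated inverse of the sample CDF


def bin_index(z, edges):
    """Index of the redshift bin containing each z (outermost edges inclusive)."""
    n_bins = edges.size - 1
    return np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)


def make_mock(z_obs, M_obs, n_bins, rng):
    """Generate one mock (z, M) catalogue from an observed one."""
    z_obs = np.asarray(z_obs, dtype=float)
    M_obs = np.asarray(M_obs, dtype=float)
    # Step (i): bin edges at quantiles, so each bin holds roughly equal numbers.
    edges = np.quantile(z_obs, np.linspace(0.0, 1.0, n_bins + 1))
    obs_bin = bin_index(z_obs, edges)
    # Steps (iii)-(iv): mock redshifts drawn from the whole-survey sample CDF.
    z_mock = inverse_sample(z_obs, z_obs.size, rng)
    mock_bin = bin_index(z_mock, edges)
    # Steps (ii) and (v): per-bin sample CDF of M, sampled for the mocks in that bin.
    M_mock = np.empty_like(z_mock)
    for b in range(n_bins):
        in_bin = mock_bin == b
        M_mock[in_bin] = inverse_sample(M_obs[obs_bin == b], int(in_bin.sum()), rng)
    return z_mock, M_mock


# Toy "survey" for illustration only (NOT MGC data).
rng = np.random.default_rng(42)
z_obs = rng.uniform(0.013, 0.18, 2000)
M_obs = -17.0 - 20.0 * z_obs + rng.normal(0.0, 1.0, z_obs.size)

z_mock, M_mock = make_mock(z_obs, M_obs, n_bins=10, rng=rng)
print(z_mock[:3], M_mock[:3])
```

Because the interpolated inverse CDF never extrapolates, every mock value lies within the
range of the observed values, which is one simple sanity check on such a sampler.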

The bottom panel of Figure A-2 shows the resulting redshift histogram for our MGC mocks
shown in red, compared to the survey distribution shown in black. For continuity with our
previous studies in this series of papers we have adopted the same redshift limits as in
Completeness II, where z_min = 0.013 and z_max = 0.18. As a quality check we show the
normalised residuals, r, for each bin, which were computed according to Equation A-1.



Assuming that our residuals are Gaussian (which should follow from the central limit
theorem) we observe that they lie well within the indicated limits [−3, 3], suggesting that
our sampled redshifts are consistent with the survey data.
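
A hedged sketch of this quality check is given below. It assumes that the residual in each
histogram bin is the mean of the mock counts minus the survey count, divided by the square
root of the unbiased sample variance of the mock counts; the sign convention, the function
name `normalised_residuals`, the binning and the bootstrap-style toy mocks are assumptions
made for illustration only.

```python
import numpy as np


def normalised_residuals(mock_samples, survey_sample, bin_edges):
    """Per-bin normalised residuals between a set of mocks and the survey histogram."""
    survey_counts, _ = np.histogram(survey_sample, bins=bin_edges)
    mock_counts = np.array(
        [np.histogram(mock, bins=bin_edges)[0] for mock in mock_samples]
    )
    mean_mc = mock_counts.mean(axis=0)           # mean over the n mocks
    var_mc = mock_counts.var(axis=0, ddof=1)     # unbiased variance (divides by n - 1)
    # Assumed sign convention: mock mean minus survey value.
    return (mean_mc - survey_counts) / np.sqrt(var_mc)


# Toy illustration (hypothetical data): mocks resampled from the "survey" itself.
rng = np.random.default_rng(1)
survey_z = rng.uniform(0.013, 0.18, 3000)
mocks = [rng.choice(survey_z, size=survey_z.size, replace=True) for _ in range(200)]
edges = np.linspace(0.013, 0.18, 11)             # 10 equal-width redshift bins

r = normalised_residuals(mocks, survey_z, edges)
print(np.round(r, 2))
```

With mocks drawn from the survey distribution itself, as in this toy case, the residuals are
expected to scatter close to zero and well inside [−3, 3].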

The mock survey absolute magnitudes were assigned with the extra condition that they must
correspond to a galaxy that would be observed within the faint apparent magnitude limit of
the survey, $m^{f}_{\rm lim}$. Therefore, each sample absolute magnitude was first generated,
then converted to an apparent magnitude, m, via the simple relation

$$ m_i = M_i + Z_i, \tag{A-4} $$

where $Z_i$ is the distance modulus derived from the simulated redshift, and thus given by

$$ Z_i = 5\log_{10}(d_{L_i}) + 25, \tag{A-5} $$

where in turn $d_{L_i}$ is the luminosity distance (in Mpc) of the i-th galaxy, i.e.

$$ d_{L_i} = (1 + z_i)\,\frac{c}{H_0}\int_0^{z_i} \frac{dz}{\sqrt{(1 + z)^3\,\Omega_{m0} + \Omega_{\Lambda 0}}}, \tag{A-6} $$

in which, to be consistent with Completeness II, we have set the present-day matter density
$\Omega_{m0} = 0.3$, the cosmological constant term $\Omega_{\Lambda 0} = 0.7$, and the Hubble
constant $H_0 = 100$ km s$^{-1}$ Mpc$^{-1}$. Note that for simplicity we do not correct for
k- or evolutionary effects, and thus our sampled distribution essentially represents the raw
absolute magnitudes. This simplification does not affect our ability to probe the error
propagation of our completeness estimators.
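
The conversion from absolute to apparent magnitude can be sketched numerically as follows.
This is an illustrative implementation only: the function names, the trapezoidal integration
with 2048 steps and the example values M = −20 and z = 0.1 are assumptions, while the
cosmological parameters are those quoted above and the faint limit of 20.0 mag is the MGC
value quoted in the caption of Figure A-4.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance(z, omega_m=0.3, omega_l=0.7, h0=100.0, n_steps=2048):
    """Luminosity distance in Mpc for a flat universe (scalar redshift z > 0)."""
    zs = np.linspace(0.0, z, n_steps)
    integrand = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + omega_l)
    dz = zs[1] - zs[0]
    # Trapezoidal rule, written out explicitly.
    integral = dz * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return (1.0 + z) * (C_KM_S / h0) * integral


def distance_modulus(z, **cosmology):
    """Distance modulus Z = 5 log10(d_L / Mpc) + 25."""
    return 5.0 * np.log10(luminosity_distance(z, **cosmology)) + 25.0


def apparent_magnitude(M, z, **cosmology):
    """Apparent magnitude m = M + Z, with no k- or evolutionary correction."""
    return M + distance_modulus(z, **cosmology)


# Example values (hypothetical galaxy).
M_example, z_example = -20.0, 0.1
d_l = luminosity_distance(z_example)
m_example = apparent_magnitude(M_example, z_example)
m_faint_limit = 20.0                      # MGC faint limit quoted in Figure A-4
keep = m_example <= m_faint_limit         # retain only mocks within the faint limit
print(f"d_L = {d_l:.1f} Mpc, m = {m_example:.2f}, kept = {keep}")
```

A mock galaxy whose apparent magnitude is fainter than the limit would simply be rejected
and redrawn in a scheme of this kind.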

The resulting sampled distributions for M and m are shown in red on the respective bottom
left and right panels of Figure A-3. As with redshift, we have calculated the normalised
residuals for both magnitude distributions; these are shown in the top panels of the figure.
We find them to be within acceptable limits of [−3, 3].

Finally, in Figure A-4 we apply the R01 completeness estimators to the mocks and compare
them to the completeness results from the actual MGC data originally calculated in
Completeness II. The superimposed Tc and Tv results for the 1000 mock catalogues are shown
in grey, with Tc on the left panel and Tv on the right panel. The MGC survey results are
shown in red and the averaged Tc and Tv values for each m* for the mocks are indicated by
the blue line. It is interesting and reassuring to observe that the averaged Tc and Tv
curves of the mocks agree remarkably well with the survey data results. Of course this
offers another method by which one could probe for systematics and/or inconsistencies in any
procedure that is adopted to generate Monte Carlo galaxy samples.



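The normalised residual, r, for each bin is

$$ r = \frac{\bar{z}^{\rm MC} - Y}{\sqrt{{\rm Var}(z^{\rm MC})}}, \tag{A-1} $$

where Y is the MGC survey data for each bin and $z_i^{\rm MC}$ the corresponding value for
each of the n = 1000 mock surveys. Thus, the mean $\bar{z}^{\rm MC}$ and an unbiased estimate
of the variance, ${\rm Var}(z^{\rm MC})$, are given by

$$ \bar{z}^{\rm MC} = \frac{1}{n}\sum_{i=1}^{n} z_i^{\rm MC}, \tag{A-2} $$

$$ {\rm Var}(z^{\rm MC}) = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_i^{\rm MC} - \bar{z}^{\rm MC}\right)^2. \tag{A-3} $$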

Figure A-4. Tc and Tv statistics applied to the 1000 MGC mocks (grey lines). The completeness results from the actual MGC survey are shown in red and the
averaged mocks are shown in blue. The faint apparent magnitude limit, $m^{f}_{\rm lim}$, of the MGC survey is indicated by the vertical dashed line at $m^{f}_{\rm lim}$ = 20.0 mag.